list-decodable linear regression
Batch List-Decodable Linear Regression via Higher Moments
Diakonikolas, Ilias, Kane, Daniel M., Karmalkar, Sushrut, Liu, Sihan, Pittas, Thanasis
We study the task of list-decodable linear regression using batches. A batch is called clean if it consists of i.i.d. samples from an unknown linear regression distribution. For a parameter $\alpha \in (0, 1/2)$, an unknown $\alpha$-fraction of the batches are clean and no assumptions are made on the remaining ones. The goal is to output a small list of vectors at least one of which is close to the true regressor vector in $\ell_2$-norm. [DJKS23] gave an efficient algorithm, under natural distributional assumptions, with the following guarantee. Assuming that the batch size $n$ satisfies $n \geq \tilde{\Omega}(\alpha^{-1})$ and the number of batches is $m = \mathrm{poly}(d, n, 1/\alpha)$, their algorithm runs in polynomial time and outputs a list of $O(1/\alpha^2)$ vectors at least one of which is $\tilde{O}(\alpha^{-1/2}/\sqrt{n})$ close to the target regressor. Here we design a new polynomial time algorithm with significantly stronger guarantees under the assumption that the low-degree moments of the covariates distribution are Sum-of-Squares (SoS) certifiably bounded. Specifically, for any constant $\delta>0$, as long as the batch size is $n \geq \Omega_{\delta}(\alpha^{-\delta})$ and the degree-$\Theta(1/\delta)$ moments of the covariates are SoS certifiably bounded, our algorithm uses $m = \mathrm{poly}((dn)^{1/\delta}, 1/\alpha)$ batches, runs in polynomial-time, and outputs an $O(1/\alpha)$-sized list of vectors one of which is $O(\alpha^{-\delta/2}/\sqrt{n})$ close to the target. That is, our algorithm achieves substantially smaller minimum batch size and final error, while achieving the optimal list size. Our approach uses higher-order moment information by carefully combining the SoS paradigm interleaved with an iterative method and a novel list pruning procedure. In the process, we give an SoS proof of the Marcinkiewicz-Zygmund inequality that may be of broader applicability.
Reviews: List-decodable Linear Regression
Post-rebuttal response: I read the authors' response and don't have any further comments. This paper considers the model of "robust statistics" where an alpha fraction of the training data comes from the ground truth distribution while the rest are corrupted arbitrarily (i.e. Traditionally, research has been on the setting where alpha is large, so that the parameters of the true distribution are information-theoretically identifiable. However, recent focus has been on the small alpha setting. Here, the parameters of the true distribution cannot be uniquely identified even with infinitely many samples.
Reviews: List-decodable Linear Regression
This paper studies the challenging problem of doing linear regression in the setting where an overwhelming fraction (1-alpha) of the examples are adversarially corrupted. It extends recent work on using the Sum-of-Squares hierarchy for robust estimation. The main contribution is realizing that anti-concentration (and being able to certify anti-concentration) is the key. The algorithm has a high running time (d (1/alpha 8)) but given the challenging nature of the problem, the reviewers felt that the fact that the problem can be solved in polynomial time for any fixed alpha 0 is surprising and an important contribution.
List-decodable Linear Regression
We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than 1/2 fraction of examples. It outputs a list L of size O(1/\alpha) - a fixed constant - that contains an \ell that is close to \ell . Our algorithm succeeds whenever the inliers are chosen from a certifiably anti-concentrated distribution D. In particular, this gives a (d/\alpha) {O(1/\alpha 8)} time algorithm to find a O(1/\alpha) size list when the inlier distribution is a standard Gaussian. For discrete product distributions that are anti-concentrated only in regular directions, we give an algorithm that achieves similar guarantee under the promise that \ell * has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. To solve the problem we introduce a new framework for list-decodable learning that strengthens the identifiability to algorithms'' paradigm based on the sum-of-squares method.
Statistical Query Lower Bounds for List-Decodable Linear Regression
We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples. Specifically, we are given a set T of labeled examples (x, y) \in \mathbb{R} d \times \mathbb{R} and a parameter 0 \alpha 1/2 such that an \alpha -fraction of the points in T are i.i.d. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the target regression vector. Our main result is a Statistical Query (SQ) lower bound of d {\mathrm{poly}(1/\alpha)} for this problem. Our SQ lower bound qualitatively matches the performance of previously developed algorithms, providing evidence that current upper bounds for this task are nearly best possible.
Statistical Query Lower Bounds for List-Decodable Linear Regression
Diakonikolas, Ilias, Kane, Daniel M., Pensia, Ankit, Pittas, Thanasis, Stewart, Alistair
We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples. Specifically, we are given a set $T$ of labeled examples $(x, y) \in \mathbb{R}^d \times \mathbb{R}$ and a parameter $0< \alpha <1/2$ such that an $\alpha$-fraction of the points in $T$ are i.i.d. samples from a linear regression model with Gaussian covariates, and the remaining $(1-\alpha)$-fraction of the points are drawn from an arbitrary noise distribution. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the target regression vector. Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem. Our SQ lower bound qualitatively matches the performance of previously developed algorithms, providing evidence that current upper bounds for this task are nearly best possible.